What Makes for an Effective Data Practitioner in 2024


Marck Vaisman

Senior Technical Specialist, Microsoft
marck.vaisman@microsoft.com

Adjunct Professor, Data Science, Georgetown University
marck.vaisman@georgetown.edu

These are my personal opinions and do not represent any organization

The OG Data Science Cheat Sheet (2010)

Feeling lonely?

ALL BY MYSELF

  • Data Businesspeople: especially adept at understanding the business context around their data work, have strong business acumen and are often involved in decision-making processes. They are likely to be involved in generating actionable insights that can influence business strategies.
  • Data Creatives: the broadest of the data scientist clusters, they are characterized by their strong ability to tackle new problems with innovative solutions. They have a rich skill set that can adapt to various challenges and are often the ones to come up with novel approaches to data analysis.
  • Data Developers: focused on the technical problem-solving aspects of data science with skills in software engineering and are often responsible for building the infrastructure required for data analysis such as databases, pipelines, and data processing systems.
  • Data Researchers: originating from academic or research backgrounds, bring a deep understanding of statistics and mathematics to their data science work. They are likely to be involved in developing new algorithms and statistical models.

(Harris, Murphy, and Vaisman 2013)

Skills evaluated in our survey (in alphabetical order)

  • Algorithms (ex: computational complexity, CS theory)
  • Back-End Programming (ex: JAVA/Rails/Objective C)
  • Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS)
  • Big and Distributed Data (ex: Hadoop, Map/Reduce)
  • Business (ex: management, business development, budgeting)
  • Classical Statistics (ex: general linear model, ANOVA)
  • Data Manipulation (ex: regexes, R, SAS, web scraping)
  • Front-End Programming (ex: JavaScript, HTML, CSS)
  • Graphical Models (ex: social networks, Bayes networks)
  • Machine Learning (ex: decision trees, neural nets, SVM, clustering)
  • Math (ex: linear algebra, real analysis, calculus)

  • Optimization (ex: linear, integer, convex, global)
  • Product Development (ex: design, project management)
  • Science (ex: experimental design, technical writing/publishing)
  • Simulation (ex: discrete, agent-based, continuous)
  • Spatial Statistics (ex: geographic covariates, GIS)
  • Structured Data (ex: SQL, JSON, XML)
  • Surveys and Marketing (ex: multinomial modeling)
  • Systems Administration (ex: *nix, DBA, cloud tech.)
  • Temporal Statistics (ex: forecasting, time-series analysis)
  • Unstructured Data (ex: noSQL, text mining)
  • Visualization (ex: statistical graphics, mapping, web-based dataviz)

Evolution of Data Science and in-demand skills 2008-2024 (graphic not complete)

looking to change this to interactive

In 2024, the hard truth: an overloaded definition and set of expectations leading to the Data Practitioner Brew

How it started, how it’s going

~2012

2024

Expectations vs. Reality

ooops!

Data Science is extremely complex

Data Science can be considered a:

  • Science: Data science is viewed as a continuation of empirical science, which has always been centered around data, with historical examples like Kepler’s use of data to prove Copernicus’ theory.

  • Research Paradigm: It represents a shift in research methodology, moving from a deductive approach to a more inductive approach due to the abundance of data and computational resources.

  • Research Method: Data science is used to discover new concepts, measure their prevalence, assess causal effects, and make predictions, transforming the research process.

  • Discipline: The field is inherently interdisciplinary, integrating knowledge from domains such as computer science, mathematics, statistics, and specific application domains.

  • Workflow: It involves a series of steps including data collection, exploratory data analysis, modeling, and communication of results.

  • Profession: The multifaceted nature of data science work includes aspects of science, engineering, mathematics, statistics, and domain expertise, making it a distinct professional field.

(Hazzan and Mike 2023)

Mastering its breadth is extremely difficult and takes time

And teaching is even harder!

Challenges

  • Broad Discipline: Data science encompasses a very broad discipline with a large body of knowledge, making it difficult to cover all necessary topics comprehensively.
  • Complex Topics: It contains complex topics such as machine learning algorithms, which require a deep understanding of mathematics and theoretical concepts.
  • Variety of Thinking Skills: Data science education requires a variety of thinking skills, including interdisciplinary thinking, computational thinking, and statistical thinking.
  • Special Professional and Organizational Skills: Beyond technical skills, data science also requires special professional skills such as teamwork, storytelling, and ethical behavior.
  • Educators’ Background: Many data science educators may not be data science majors themselves but may come from one of the disciplines that intersect to create data science, such as mathematics, statistics, computer science, or a specific application domain.

(Hazzan and Mike 2023)

WWDSD?

What would a Data Scientist do? #askingforafriend

The work we did in 2013 has received over 146 citations in the last 10 years

Prior work on academic models for skills and competencies for Data Science

  1. EDISON Data Science Framework (EDSF): This framework includes a Competency Framework for Data Science (CF-DS) and a defined body of knowledge (DS-BoK). The CF-DS is developed around five major knowledge area groups: Data Analytics, Data Science Engineering, Data Management, Research Methods and Project Management, and Domain Related Competencies and Business Analytics Competencies. These areas define the explicit skills and knowledge that exemplify competence in data science.

  2. AIS IS 2010 Curriculum Guidelines: This curriculum is designed to educate and prepare graduates to enter the workforce by equipping them with knowledge and skills in three categories: IS-specific knowledge and skills, foundational knowledge and skills, and domain fundamentals. The “IS 2010” report is a collaborative effort between the Association for Computing Machinery (ACM) and the Association for Information Systems (AIS).

  3. Business Higher Education Forum (BHEF) Data Science and Analytics Competency Map: This map lists Data Science concepts and principles tiered into when and where these concepts are learned.

  4. ACM and IDASS Competencies: These documents provide a high-level list of competencies that undergraduate Data Science students should learn, with competencies directly comparable to EDISON’s CF-DS.

  5. Park City Math Institute Curriculum Guidelines for Undergraduate Programs in Data Science: This guideline does not elaborate on how to integrate the application domains knowledge into the curriculum but recognizes the importance of domain-related knowledge for practical work of a Data Scientist.

(Schmitt et al. 2023; Weiser et al. 2022; Hazzan and Mike 2023)

Skills

The difference between a skill and a competency is often related to the scope and integration of knowledge, abilities, and behaviors. A skill is typically understood as a specific learned activity that can be performed, often something that can be developed through practice. Competencies, on the other hand, are broader and include a combination of skills, knowledge, and attributes that enable someone to perform effectively in a job or situation.

Competencies

Competencies are often described as more holistic, encompassing not just the ability to perform a task (skill) but also the understanding (knowledge) and the appropriate application (attributes) of that skill in various contexts. They reflect a person’s capability to apply or use a set of related knowledge, skills, and abilities required to successfully perform “critical work functions” or tasks in a defined work setting.

(Weiser et al. 2022)

Thinking models

Computational

  • Cognitive Ability: It is a problem-solving process that involves designing solutions to be implemented by a person, a computer, or both.
  • Independent of Technology: Its implementation is not dependent on technology.
  • Key Skills and Processes: These include problem formulation, dividing a problem into sub-problems, organizing and logically analyzing data, representing data with models and simulations using abstraction, suggesting and assessing solutions, examining and implementing the chosen solution, and generalizing the solution to a range of problems.
  • Social Skills: Teamwork, time management, and planning and scheduling tasks are also part of computational thinking.
  • Broad Knowledge: Emphasizes the acquisition of multidisciplinary knowledge and skills that can be applied in various contexts.

Statistical

  • Understanding of Data: Involves an understanding of the essence, characteristics, and variability of real-life data.
  • Statistical Inquiry: Covers the entire process of statistical inquiry, from data collection to analysis and interpretation.
  • Use of Statistical Methods: Addresses when and how to use specific statistical data analysis methods.
  • Sampling and Inference: Refers to the nature of sampling and how to infer from samples to populations.
  • Statistical Models: Includes the use of statistical models and their application.
  • Contextual Analysis: Considers the context of a given problem when performing investigations and drawing conclusions.

Mathematical

  • Mathematical Foundations: A core competence for data science graduates, involving a deep understanding of mathematical principles.
  • Model Building and Assessment: Skills in constructing and evaluating mathematical models relevant to data science.

Application Domain

  • Domain-Specific Knowledge: Understanding the core principles and ethical considerations within a specific application domain.
  • Data Curation: Involves managing and organizing data relevant to the domain.
  • Knowledge Transference: Communication and responsibility in transferring domain-specific insights.

Data

  • Analytical Thinking: Combines computational and statistical thinking to analyze data effectively.
  • Algorithms and Software Foundation: Knowledge of algorithms and software relevant to data science.
  • Data Curation Skills: Involves the ability to curate and manage data efficiently.
  • Communication and Responsibility: Skills in communicating findings and understanding the implications of data analysis.

(Hazzan and Mike 2023; Adhikari and Jordan 2021)

Proposed skills model (graphic not complete)

(Weiser et al. 2022; Hamutcu and Fayyad 2020; Fayyad and Hamutcu 2022; Hazzan and Mike 2023; Adhikari and Jordan 2021; Cuadrado-Gallego and Demchenko 2023)

Data Analytics

most likely remove

Main competency group (task)

  1. Effectively use a variety of data analytics techniques, such as machine learning (including supervised, unsupervised, semi-supervised learning), data mining, prescriptive and predictive analytics, for complex data analysis through the whole data life cycle
  2. Apply designated quantitative techniques, including statistics, time series analysis, optimisation and simulation to deploy appropriate models for analysis and prediction
  3. Identify, extract and pull together available and pertinent heterogeneous data, including modern data sources such as social media data, open data, governmental data; verify data quality
  4. Understand and use different performance and accuracy metrics for model validation in analytics projects, hypothesis testing and information retrieval
  5. Develop required data analytics for organisational tasks, integrate data analytics and processing applications into organisation workflow and business processes to enable agile decision-making
  6. Visualise results of data analysis, design dashboard and use storytelling methods
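Item 4 above mentions performance and accuracy metrics for model validation. As a minimal sketch of what that can look like in practice, the block below computes accuracy, precision, recall and F1 for a binary classifier using only the Python standard library. The function name `classification_metrics` and the sample labels are illustrative assumptions, not part of the EDISON framework.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, made up purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which metric matters depends on the cost of each error type, which is where the domain and business competencies discussed earlier come in.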

Skills (technique)

  1. Use machine learning technology, algorithms, tools, including supervised, unsupervised or reinforced learning
  2. Use data mining techniques
  3. Use text data mining techniques
  4. General statistical analysis methods and techniques, descriptive analytics
  5. Use quantitative analytics methods
  6. Use qualitative analytics methods
  7. Apply predictive analytics methods
  8. Apply prescriptive analytics methods
  9. Use graph data analytics for organisational network analysis, customer relations, other tasks
  10. Apply analytics and statistics methods for data preparation and preprocessing
  11. Be able to use performance and accuracy metrics for data analytics assessment and validation
  12. Use effective visualisation and storytelling methods to create dashboards and data analytics reports
  13. Use natural language processing methods
  14. Operations research
  15. Optimisation
  16. Simulation

Knowledge

  1. Machine learning supervised: decision trees, naïve Bayes classification, ordinary least square regression, logistic regression, neural networks, SVM (support vector machine), ensemble methods, others
  2. Machine learning unsupervised: clustering algorithms, principal component analysis (PCA), singular value decomposition (SVD), independent component analysis (ICA)
  3. Machine learning (reinforced): Q-learning, TD-learning, genetic algorithms
  4. Data mining (text mining, anomaly detection, regression, time series, classification, feature selection, association, clustering)
  5. Text data mining: statistical methods, NLP, feature selection, a priori algorithm, etc.
  6. General statistical analysis methods and techniques, descriptive analytics
  7. Quantitative analytics
  8. Qualitative analytics
  9. Predictive analytics
  10. Prescriptive analytics
  11. Graph data analytics: path analysis, connectivity analysis, community analysis, centrality analysis, sub-graph isomorphism, etc.
  12. Natural language processing
  13. Data preparation and preprocessing

Tools, platforms, and applications

  1. R and data analytics libraries (CRAN, ggplot2, dplyr, reshape2, etc.)
  2. Python and data analytics libraries (pandas, numpy, matplotlib, scipy, scikit-learn, seaborn, etc.)
  3. SAS
  4. Julia
  5. IBM SPSS
  6. Other Statistical computing and languages (WEKA, KNIME, Scala, Stata, Orange, etc)
  7. Scripting language, e.g. Octave, PHP, Pig, others
  8. Matlab Data Analytics
  9. Analytics tools (R/R Studio, Python/Anaconda, SPSS, Matlab, etc)
  10. Data Mining tools: RapidMiner, Orange, R, WEKA, NLTK, others
  11. Excel Data Analytics (Analysis ToolPak, PivotTables, etc)

Data Analytics Summary

Data analytics competences deal with the use of appropriate data analytics and statistical techniques on available data to discover new relations, deliver insights into research problems or organisational processes, and support decision-making. They cover extensive skills related to using different machine learning, data mining, and statistical methods and algorithms.

  • Statistical Methods: This includes descriptive statistics, exploratory data analysis (EDA) for discovering new features in the data, and confirmatory data analysis (CDA) for hypothesis testing.
  • Machine Learning: Encompasses methods for information search, image recognition, decision support, and classification.
  • Data Mining: Focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes.
  • Text Analytics: Involves extracting and classifying information from textual sources using statistical, linguistic, and structural techniques.
  • Predictive Analytics: Centers on the application of statistical models for predictive forecasting or classification.
  • Business Analytics: Covers data analysis that relies heavily on aggregation, involving various data sources to focus on business information.
  • Computational Modeling, Simulation, and Optimization: Pertains to methods used for computational modeling, simulation, and optimization.

Data Engineering

most likely remove

Main competency group (task)

  1. Use engineering principles (general and software) to research, design, develop and implement new instruments and applications for data collection, storage, analysis and visualisation
  2. Develop and apply computational and data-driven solutions to domain-related problems using a wide range of data analytics platforms, with the special focus on big data technologies for large datasets and cloud-based data analytics platforms
  3. Develop and prototype specialised data analysis applications, tools and supporting infrastructures for data-driven scientific, business or organisational workflow; use distributed, parallel, batch and streaming processing platforms, including online and cloud-based solutions for on-demand provisioned and scalable services
  4. Develop, deploy and operate large-scale data storage and processing solutions using different distributed and cloud-based platforms for storing data (e.g. Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, others)
  5. Consistently apply data security mechanisms and controls at each stage of the data processing, including data anonymisation, privacy and IPR protection
  6. Design, build, operate relational and non-relational databases (SQL and NoSQL), integrate them with the modern data warehouse solutions, ensure effective ETL (extract, transform, load), OLTP, OLAP processes for large datasets
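Item 6 above names ETL (extract, transform, load) against relational databases. The following is a small sketch of that pattern, assuming an in-memory SQLite database and a made-up CSV payload; the table name `orders`, the column names and the cleaning rule (skip rows with no amount) are all hypothetical choices for illustration.

```python
import csv
import io
import sqlite3

# Made-up raw data; the second row is missing its amount on purpose.
RAW_CSV = """order_id,region,amount
1,east,10.50
2,west,
3,east,4.25
4,north,20.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, normalise region, cast types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        clean.append((int(row["order_id"]),
                      row["region"].upper(),
                      float(row["amount"])))
    return clean


def load(conn, records):
    """Load: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A simple aggregate query over the loaded data.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Real pipelines add scheduling, monitoring and schema management on top, but the three stages keep the same shape.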

Skills (technique)

  1. Apply systems and software engineering principles to the organisation's information system design and development, including requirements design
  2. Use cloud computing technologies and cloud-powered services design for data infrastructure and data handling services
  3. Use cloud-based big data technologies for large datasets processing systems and applications
  4. Use agile development technologies, such as DevOps and continuous improvement cycle, for data-driven applications
  5. Develop and implement systems and data security, data access, including data anonymisation, federated access control systems
  6. Apply compliance-based security models, in particular for privacy and IPR protection
  7. Use relational, non-relational databases (SQL and NoSQL), data warehouse solutions, ETL (extract, transform, load), OLTP, OLAP processes for structured and unstructured data
  8. Effectively use big data infrastructures, high-performance networks, infrastructure and services management and operation
  9. Use and apply modelling and simulation technologies and systems
  10. Use and integrate with the organisational information systems, collaborative system
  11. Design efficient algorithms for accessing and analysing large amounts of data, including API to different databases and datasets
  12. Use of recommender or ranking system

Knowledge

  1. Systems engineering and software engineering principles, methods and models, distributed systems design and organisation
  2. Cloud computing, cloud-based services and cloud-powered services design
  3. Big data technologies for large datasets processing: batch, parallel, streaming systems, in particular cloud based
  4. Applications software requirements engineering and design, agile development technologies, DevOps and continuous improvement cycle
  5. Systems and data security, data access, including data anonymisation, federated access control systems
  6. Compliance-based security models, privacy and IPR protection
  7. Relational, non-relational databases (SQL and NoSQL), data warehouse solutions, ETL (extract, transform, load), OLTP, OLAP processes for large datasets
  8. Big data infrastructures, high-performance networks, infrastructure and services management and operation
  9. Modelling and simulation, theory and systems
  10. Information systems, collaborative systems

Tools, platforms, and applications

  • missing

Data Engineering Summary

Data science engineering competences deal with the use of engineering principles and modern computer technologies to research, design and implement new data analytics applications, and to develop experiments, processes, instruments, systems and infrastructures to support data handling during the whole data life cycle.

The competences involve the application of engineering principles and modern computer technologies to design, implement, and manage data analytics applications and infrastructures throughout the data lifecycle.

Here is a summary of the competences identified in the DSDEG section:

  • Utilize engineering principles to research, design, develop, and implement new instruments and applications for data collection, storage, analysis, and visualization.

  • Develop and apply computational and data-driven solutions to domain-related problems using a wide range of data analytics platforms, focusing on big data technologies for large datasets and cloud-based data analytics platforms.

  • Develop and prototype specialized data analysis applications, tools, and supporting infrastructures for data-driven scientific, business, or organizational workflow; use distributed, parallel, batch, and streaming processing platforms, including online and cloud-based solutions.

  • Develop, deploy, and operate large-scale data storage and processing solutions using different distributed and cloud-based platforms for storing data (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, etc.).

Data Management Skills/Competencies/Knowledge

most likely remove

Main competency group (task)

  1. Develop and implement data strategy, in particular, in the form of data management policy and data management plan (DMP)
  2. Develop and implement relevant data models, define metadata using common standards and practices, for different data sources in a variety of scientific and industry domains
  3. Integrate heterogeneous data from multiple sources and provide them for further analysis and use
  4. Maintain historical information on data handling, including reference to published data and corresponding data sources (data provenance)
  5. Ensure data quality, accessibility, interoperability, compliance to standards and publication (data curation)
  6. Develop and manage/supervise policies on data protection, privacy, intellectual property rights (IPR) and ethical issues in data management

Skills (technique)

  1. Specify, develop and implement enterprise data management and data governance strategy and architecture, including data management plan (DMP)
  2. Data storage systems, data archive services, digital libraries and their operational models
  3. Define requirements for and supervise implementation of the hybrid data management infrastructure, including enterprise private and public cloud resources and services
  4. Develop and implement data architecture, data types and data formats, data modelling and design, including related technologies (ETL, OLAP, OLTP, etc.)
  5. Implement data lifecycle support in organisational workflow, support data provenance and linked data
  6. Consistently implement data curation and data quality controls, ensure data integration and interoperability
  7. Implement data protection, backup, privacy, mechanisms/services, comply with IPR, ethics and responsible data use
  8. Use and implement metadata, PID, data registries, data factories, standards and compliance
  9. Adhere to the principles of the open data, open science, open access, use ORCID-based services

Knowledge

  1. Data management and enterprise data infrastructure, private and public data storage systems and services
  2. Data storage systems, data archive services, digital libraries and their operational models
  3. Data governance, data governance strategy, data management plan (DMP)
  4. Data architecture, data types and data formats, data modelling and design, including related technologies (ETL, OLAP, OLTP, etc.)
  5. Data lifecycle and organisational workflow, data provenance and linked data
  6. Data curation and data quality, data integration and interoperability
  7. Data protection, backup, privacy, IPR, ethics and responsible data use
  8. Metadata, PID, data registries, data factories, standards and compliance
  9. Open data, open science, research data archives/repositories, open access, ORCID

Tools, platforms, and applications

  • missing

Data Management Summary

Data science management and governance competences deal with developing and implementing a data management strategy for data collection, storage, preservation and availability for further processing.

The section labeled 2.2.4 Data Management Competences, Skills and Knowledge (DSDM) in the document by Cuadrado-Gallego and Demchenko (2020) outlines the competencies required for data management within the field of data science. The competences involve the development and implementation of strategies and systems for managing data throughout its lifecycle, ensuring its quality, accessibility, and compliance with standards.

Here is a summary of the competences identified in the DSDM section:

  • Develop and implement data strategy, particularly in the form of data management policy and data management plan (DMP).

  • Develop and implement relevant data models, define metadata using common standards and practices, for different data sources in a variety of scientific and industry domains.

  • Integrate heterogeneous data from multiple sources and provide them for further analysis and use.

  • Maintain historical information on data handling, including reference to published data and corresponding data sources (data provenance).

  • Ensure data quality, accessibility, interoperability, compliance to standards, and publication (data curation).

Research Methods and Project Management

most likely remove

Main competency group (task)

Data science research methods and project management competences create new understandings and capabilities by using the scientific method (hypothesis, test/artefact, evaluation) or similar engineering methods to discover new approaches to create new knowledge and achieve research or organisational goals.

  1. Create new understandings by using the research methods (including hypothesis, artefact/experiment, evaluation) or similar engineering research and development methods
  2. Direct systematic study towards understanding of the observable facts, and discover new approaches to achieve research or organisational goals
  3. Analyse domain-related research process model, identify and analyse available data to identify research questions and/or organisational objectives and formulate sound hypothesis
  4. Undertake creative work, making systematic use of investigation or experimentation, to discover or revise knowledge of reality, and use this knowledge to devise new applications, contribute to the development of organisational objectives
  5. Design experiments which include data collection (passive and active) for hypothesis testing and problem-solving
  6. Develop and guide data-driven projects, including project planning, experiment design, data collection and handling.

Skills (technique)

  1. Use research methods principles in developing data-driven applications and implementing the whole cycle of data handling
  2. Design experiment, develop and implement data collection process
  3. Apply data lifecycle management model to data collection and data quality evaluation
  4. Apply structured approach to use case analysis
  5. Develop and implement research data management plan (DMP), apply data stewardship procedures
  6. Consistently apply project management workflow: scope, planning, assessment, quality and risk management, team management

Knowledge

  1. Research methods, research cycle, hypothesis definition and testing
  2. Experiment design, modelling and planning
  3. Data life cycle and data collection, data quality evaluation
  4. Use case analysis: research infrastructure and projects
  5. Research data management plan (DMP) and data stewardship
  6. Project management: scope, planning, assessment, quality and risk management, team management

Tools, platforms, and applications

  • missing

Research Methods Summary

needs to be completed

Data science research methods and project management competences create new understandings and capabilities by using the scientific method (hypothesis, test/artefact, evaluation) or similar engineering methods to discover new approaches to create new knowledge and achieve research or organisational goals.

  • missing

What’s missing from EDISON?

needs to be completed

  • Generative AI
  • MLOps
  • DevOps

Here we go again…

Slide on 70/20/10 model

to be completed

Call to action

What are you going to do to help fix this?

  1. Standardize Roles: Adopt a simplified, anchor-based categorization of job roles with clear definitions and expectations to reduce confusion and align understanding across the industry.

  2. Utilize Frameworks: Refer to established frameworks such as the IADSS Data Science Knowledge Framework to converge on the body of knowledge specification and ensure consistency in role definitions.

  3. Skills-Based Assessment: Implement objective, skills-based assessments to standardize the evaluation of data science professionals and ensure alignment with role requirements.

  4. Industry-Specific Classifications: Extend the work to include industry-specific role classifications that cater to the unique needs and contexts of different sectors.

  5. Benchmarking and Comparison: Compare and contrast standardized role definitions with actual industry practices, such as analyzing LinkedIn job posts, to ensure relevance and applicability.

  6. Communication and Education: Communicate the process and expectations clearly to all stakeholders, including hiring managers, executives, educators, and aspiring data science professionals, to align expectations.

  7. Continuous Role Evolution: Recognize that the field is dynamic, with new titles and roles emerging, and be open to further subclassification and evolution of data scientist roles.

  8. Continuous Tooling Evolution: Adopt new tools and techniques as needed and build a strong foundation on industry-standard tools.

(Fayyad and Hamutcu 2022)

References

Adhikari, Ani, and Michael I. Jordan. 2021. “Interleaving Computational and Inferential Thinking in an Undergraduate Data Science Curriculum.” Harvard Data Science Review, March. https://doi.org/10.1162/99608f92.cb0fa8d2.
Cuadrado-Gallego, Juan J., and Yuri Demchenko. 2023. Data Analytics: A Theoretical and Practical View from the EDISON Project. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-39129-3.
Fayyad, Usama, and Hamit Hamutcu. 2022. “From Unicorn Data Scientist to Key Roles in Data Science: Standardizing Roles.” Harvard Data Science Review, July. https://doi.org/10.1162/99608f92.008b5006.
Hamutcu, Hamit, and Usama Fayyad. 2020. “Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards.” Harvard Data Science Review, June. https://doi.org/10.1162/99608f92.1a99e67a.
Harris, Harlan D., Sean Patrick Murphy, and Marck Vaisman. 2013. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. First edition. Beijing: O’Reilly.
Hazzan, Orit, and Koby Mike. 2023. Guide to Teaching Data Science: An Interdisciplinary Approach. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-24758-3.
Schmitt, Karl R. B., Linda Clark, Katherine M. Kinnaird, Ruth E. H. Wertz, and Björn Sandstede. 2023. “Evaluation of EDISON’s Data Science Competency Framework Through a Comparative Literature Analysis.” FoDS 5 (2): 177–98. https://doi.org/10.3934/fods.2021031.
Weiser, Orli, Yoram M. Kalman, Carmel Kent, and Gilad Ravid. 2022. “65 Competencies: Which Ones Should Your Data Analytics Experts Have?” Commun. ACM 65 (3): 58–66. https://doi.org/10.1145/3467018.

Technical Notes (and more skills…)

  • Images generated with the DALL-E 3 model on Azure OpenAI
  • Content summarization performed by:
    • Chunking PDFs
    • Embedding with text-embedding-ada-002
    • Using Azure AI hybrid search
    • Summarizing with GPT-4
  • Presentation written in Quarto and rendered as reveal.js and presented in browser
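The first step of the summarization pipeline above is chunking. The deck does not say how the PDFs were split, so the block below is only a sketch of one common approach: fixed-size, word-based chunks with some overlap, applied to text that has already been extracted from a PDF. The function name `chunk_text` and the default sizes are assumptions for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny example with ten placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be embedded and indexed for search; the overlap keeps sentences that straddle a boundary retrievable from either side.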

Let’s talk about education and industry partnerships!

marck.vaisman@microsoft.com

marck.vaisman@georgetown.edu

wahalulu

marckvaisman

wahalulu